Abstract: Partial least squares (PLS) is a popular modeling technique commonly used 
in social sciences. The traditional PLS algorithm deals with variables measured on interval 
scales, while data are often collected on ordinal scales: a reformulation of the algorithm, 
named ordinal PLS (OPLS), is introduced, which properly deals with ordinal variables. An 
application to customer satisfaction data and some simulations are also presented. The 
technique seems to perform better than the traditional PLS when the number of categories 
of the items in the questionnaire is small (4 or 5), which is typical in the most common 
practical situations. 

Keywords: Ordinal Variables, Partial Least Squares, Path Analysis, Polychoric Correlation 
Matrix, Structural Equation Models with Latent Variables 



1 Introduction 



The partial least squares (PLS) technique is largely used in socio-economic studies where 
path analysis is performed with reference to the so-called structural equation models with 
latent variables (SEM-LV). 

Furthermore, it often happens that data are measured on ordinal scales; a typical example 
concerns customer satisfaction surveys, where responses given to a questionnaire are on 
Likert type scales assuming a unique common finite set of possible categories. 

In several research and applied works, averages, linear transformations, covariances and 
Pearson correlation coefficients are conventionally computed also on the ordinal variables 
coming from surveys. This practice can be theoretically justified by invoking the pragmatic 
approach to statistical measurement (Hand 2009). Namely, according to this approach 'the 
precise property being measured is defined simultaneously with the procedure for measuring 
it' (Hand 2012), so when defining a construct, e.g. the overall customer satisfaction, the 
measuring instrument is also defined, and 'in a sense this makes the scale type the choice of 
the researcher' (Hand 2009). 



A more traditional approach (see Stevens 1946) would require appropriate procedures to 
be adopted in order to handle manifest indicators of the ordinal type. Within the well-known 
covariance-based framework, several approaches are suggested in order to appropriately 
estimate the model: Muthén (1984), Jöreskog (2005) and Bollen (1989) make the assumption 
that to each manifest indicator there corresponds an underlying continuous latent variable, 
see Section 3. 

Other approaches have been proposed to deal with ordinal variables within the Partial 
Least Squares (PLS) framework for SEM-LV: Jakobowicz & Derquenne (2007) base their 
procedure on the use of generalized linear models; Nappo (2009) and Lauro et al. (2011) on 
Optimal Scaling and Alternating Least Squares; Russolillo & Lauro (2011) on the Hayashi 
first quantification method (Hayashi 1952). 



As observed by Russolillo & Lauro (2011), in the procedure by Jakobowicz & Derquenne 
(2007) a value is assigned to the impact of each explanatory variable on each category of the 
response, while the researcher may be interested in the impact of each explanatory variable 
on the response as a whole. The same issue characterizes the techniques illustrated in 
Chapter 5 of Lohmöller (1989). The present proposal goes in this direction: a reformulation 
of the PLS path modeling algorithm is introduced, see Section 6, allowing us to deal with 
variables of the ordinal type in a manner analogous to the covariance-based procedures. 
In this way we recall the traditional psychometric approach, by applying a method for 
treating ordinal measures according to the well-known Thurstone (1959) scaling procedure, 
that is, assuming the presence of a continuous underlying variable for each ordinal indicator. 
The polychoric correlation matrix can then be defined. We show that by using this matrix one 
can obtain parameter estimates also within the PLS framework. 

When the number of points of the scale is sufficiently high the value of the polychoric 
correlation between two variables is usually quite close to that of the Pearson correlation; in 
these situations there would be no need to have recourse to polychoric correlations and the 
traditional PLS algorithm may be applied. However, to make the response of interviewed 
people easier, it is common practice to administer questionnaires whose items are measured 
on at most 4 or 5 point alternatives: in these circumstances the proposed modification of 
the PLS algorithm seems to be appropriate. 

An application to customer satisfaction data and some simulations conclude the paper. 



2 The Structural Equation Model with latent variables 



A linear SEM-LV consists of two sets of equations (see Bollen 1989): the structural or 
inner model, describing the path of the relationships among the latent variables, and the 
measurement or outer model, representing the relationships linking the latent variables, 
which cannot be directly observed, to appropriate corresponding manifest variables. 

The inner model 

The structural model is represented by the following linear relation 

$$\eta = B\eta + \Gamma\xi + \zeta \qquad (1)$$

where $\eta$ is an $(m \times 1)$ vector of latent endogenous random variables (dependent variables); 
$\xi$ is an $(n \times 1)$ vector of latent exogenous random variables; $\zeta$ is an $(m \times 1)$ vector of 
error components, zero mean random variables. $B$ and $\Gamma$ are respectively $(m \times m)$ and 
$(m \times n)$ matrices containing the so-called structural parameters. In particular the matrix 
$B$ contains information concerning the linear relationships among the latent endogenous 
variables: its elements represent the direct causal effect on each $\eta_i$ ($i = 1, \dots, m$) of the 
remaining $\eta_j$ ($j \neq i$). The matrix $\Gamma$ contains the coefficients explaining the relationships 
among the latent exogenous and the endogenous variables: its elements represent the 
direct causal effects of the $\xi$ components on the $\eta_i$ variables. 

When the matrix $B$ is lower triangular or can be recast as lower triangular by changing the 
order of the elements in $\eta$ (which is possible if $B$ has all zero eigenvalues, see e.g. Faliva 
1992) and $\Psi$, the covariance matrix of $\zeta$, is diagonal, then the model (1) is said to be causal 
or of the recursive type, which excludes feedback effects. In the sequel we will assume $B$ to 
be lower triangular. 



The outer model 



We consider only measurement models of the reflective type, which are adopted 
(Diamantopoulos et al. 2008) when the latent variables 'determine' their manifest indicators, 
and are defined according to the following linear relationships 

$$x = \Lambda_x \xi + \varepsilon_x \qquad (2)$$
$$y = \Lambda_y \eta + \varepsilon_y \qquad (3)$$

where the vector random variables $x$ $(q \times 1)$ and $y$ $(p \times 1)$ represent the indicators for 
the latent variables $\xi$ and $\eta$, respectively. Each latent construct $\eta_i$ in $\eta$ and $\xi_i$ in $\xi$ is 
characterized by a set of indicators, $Y_{ih}$, $h = 1, \dots, p_i$ for $\eta_i$ and $X_{ih}$, $h = 1, \dots, q_i$ for $\xi_i$. 

The error components are assumed to be uncorrelated with each other and with the 
independent variables in relationships (1)-(3). All latent variables and errors are usually 
assumed to follow a multivariate Normal distribution. 

Measurement models of the formative and of the MIMIC type may also be defined (see 
Esposito Vinzi et al. 2010) but are not considered here. 

It often happens that the indicators $Y_{ih}$, $X_{ih}$ are measured on ordinal scales, e.g. the 
responses given by the respondents to a questionnaire are on Likert type scales that assume 
a unique common finite set of possible categories. In this instance appropriate procedures 
are adopted for parameter estimation in SEM-LV in order to treat manifest indicators of 
the ordinal type. In Muthén (1984), Jöreskog (2005), Bollen (1989) and Bollen & Maydeu-Olivares 
(2007) estimation procedures within a covariance-based framework are presented, 
which are based on the assumption that to each manifest indicator there corresponds a 
further underlying continuous latent variable, whose definition is described in Section 3. 

The PLS specification 

Observe that the structural relationship (1) can be re-written in matrix notation as 

$$\begin{bmatrix} \xi \\ \eta \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ \Gamma & B \end{bmatrix} \begin{bmatrix} \xi \\ \eta \end{bmatrix} + \begin{bmatrix} \xi \\ \zeta \end{bmatrix} \qquad (4)$$

and the reflective measurement model (2)-(3) as 

$$\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} \Lambda_x & 0 \\ 0 & \Lambda_y \end{bmatrix} \begin{bmatrix} \xi \\ \eta \end{bmatrix} + \begin{bmatrix} \varepsilon_x \\ \varepsilon_y \end{bmatrix}.$$

In the initial works on PLS (Lohmöller 1989, Wold 1985) the same notation is used for 
endogenous and exogenous entities; thus the above relationships are re-written as 

$$Y = DY + \nu \qquad (5)$$
$$X = \Lambda Y + \varepsilon \qquad (6)$$

by having (re-)defined 

$$Y = \begin{bmatrix} \xi \\ \eta \end{bmatrix}, \quad D = \begin{bmatrix} 0 & 0 \\ \Gamma & B \end{bmatrix}, \quad \nu = \begin{bmatrix} \xi \\ \zeta \end{bmatrix}, \quad X = \begin{bmatrix} x \\ y \end{bmatrix}, \quad \Lambda = \begin{bmatrix} \Lambda_x & 0 \\ 0 & \Lambda_y \end{bmatrix}, \quad \varepsilon = \begin{bmatrix} \varepsilon_x \\ \varepsilon_y \end{bmatrix}.$$

The sub-matrix $B$ of $D$, corresponding to the matrix $B$ which appears in (1), is assumed to be 
lower triangular. The so-called Wold predictor specification, $E(\zeta_i \mid \eta_1, \dots, \eta_{i-1}) = 0$, $i = 
1, \dots, m$, is made, giving rise to structural equation models of the causal type. 
The measurement model of the reflective type is named 'Mode A' in the PLS terminology. 

3 Assumptions on the genesis of ordinal categorical observed variables 



The set of responses are assumed to be expressed on a conventional ordinal scale. This type 
of scale requires, according to the classical psychometric approach, appropriate methods to 
be applied. Here we propose to adopt a traditional psychometric approach, by considering 
the existence of an underlying continuous unobservable latent variable for each observed 
ordinal manifest variable. 

Following the PLS notation (5) and (6) for structural equation models with latent variables, 
the set of responses gives rise to a $K$-dimensional random categorical variable, say 
$X = (X_1, \dots, X_K)'$, whose components may assume, for simplicity, the same $I$ ordered 
categories, denoted by the conventional integer values $i = 1, \dots, I$. $K$ is the dimension of 
$X$ in (6), corresponding to $q + p$, the sum of the dimensions of $x$ and $y$. 

Let $P(X_k = i) = p_{ki}$, with $\sum_{i=1}^{I} p_{ki} = 1$, $\forall k$, be the corresponding marginal probabilities 
and let 

$$F_k(i) = \sum_{j \le i} p_{kj} \qquad (7)$$

be the probability of observing a conventional value $x_k$ for $X_k$ not greater than $i$. Furthermore 
assume that to each categorical variable $X_k$ there corresponds an unobservable 
latent variable $X^*_k$, which is represented on an interval scale with a continuous distribution 
function $\Phi_k(x^*_k)$. The distribution for the continuous $K$-dimensional latent random variable 
$X^* = (X^*_1, \dots, X^*_K)$ is usually assumed to be multinormal. Each observed ordinal indicator 
$X_k$, $k = 1, \dots, K$, is related to the corresponding latent continuous $X^*_k$ by means of a 
non-linear monotone function (Bollen 1989, Jöreskog 2005, Muthén 1984) of the type 

$$X_k = \begin{cases} 1 & \text{if } X^*_k \le a_{k,1} \\ 2 & \text{if } a_{k,1} < X^*_k \le a_{k,2} \\ \vdots & \\ I_k - 1 & \text{if } a_{k,I_k-2} < X^*_k \le a_{k,I_k-1} \\ I_k & \text{if } a_{k,I_k-1} < X^*_k \end{cases} \qquad (8)$$

where $a_{k,1}, \dots, a_{k,I_k-1}$ are marginal threshold values defined as $a_{k,i} = \Phi^{-1}(F_k(i))$, $i = 
1, \dots, I_k - 1$, $\Phi(\cdot)$ being the cumulative distribution function of a specific random variable, 
usually the standard Normal (Jöreskog 2005); $I_k \le I$ denotes the number of categories 
effectively used by the respondents, with $I_k = I$ when each category has been chosen by at 
least one respondent. 

We also set $a_{k,0} = -4$ and $a_{k,I_k} = 4$, and set to $-4$ or $4$ the threshold values that are 
respectively lower than $-4$ or larger than $4$. 
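
The threshold computation just described can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `marginal_thresholds` and its `bound` argument are hypothetical, the standard Normal is taken as the underlying distribution, and every category is assumed to have been chosen by at least one respondent.

```python
from statistics import NormalDist


def marginal_thresholds(counts, bound=4.0):
    """Return [a_0, a_1, ..., a_I] for one ordinal item from its category counts.

    Assumes (hypothetically) that every category was chosen at least once,
    so that each cumulative proportion F_k(i) lies strictly inside (0, 1).
    """
    total = sum(counts)
    std_normal = NormalDist()
    thresholds, cumulative = [], 0.0
    for n in counts[:-1]:                     # the last category has no finite upper threshold
        cumulative += n / total               # F_k(i), cumulative marginal proportion
        a = std_normal.inv_cdf(cumulative)    # a_{k,i} = Phi^{-1}(F_k(i))
        thresholds.append(max(-bound, min(bound, a)))  # truncate to [-4, 4]
    return [-bound] + thresholds + [bound]    # a_{k,0} = -4 and a_{k,I} = 4


# Usage: a 5-point item answered by 100 respondents.
print(marginal_thresholds([10, 20, 40, 20, 10]))
```

The returned list has one more element than the number of categories, so that category $i$ corresponds to the interval between elements $i-1$ and $i$.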

4 Appropriate covariance matrix in presence of ordinal categorical variables 

We recall that covariance-based estimation procedures look for the parameter values 
minimizing the distance between the covariance matrix of the manifest variables, specified as 
a function of the parameters, and its sample counterpart. Since in case of ordinal variables 
it is not possible to compute the covariance matrix, we have recourse to the polychoric 
correlation matrix or the polychoric covariance matrix (see Bollen 1989). 

For two generic ordinal categorical variables $X_h$ and $X_k$, $h, k \in \{1, \dots, K\}$, the polychoric 
correlation (Drasgow 1986, Olsson 1979) is defined as the value of $\rho$ maximizing the 
loglikelihood, typically conditional on the marginal threshold estimates, 

$$\sum_{i=1}^{I_h} \sum_{j=1}^{I_k} n_{ij} \ln(\pi_{ij}),$$

where $n_{ij}$ is the number of observations for the categories $i$th of $X_h$ and $j$th of $X_k$, 

$$\pi_{ij} = \Phi_2(a_{h,i}, a_{k,j}) - \Phi_2(a_{h,i-1}, a_{k,j}) - \Phi_2(a_{h,i}, a_{k,j-1}) + \Phi_2(a_{h,i-1}, a_{k,j-1}),$$

$\Phi_2(\cdot)$ being the standard bivariate Normal distribution function with correlation $\rho$, conditional 
on the threshold values $a_{h,i}$, $a_{k,j}$ for $X^*_h$ and $X^*_k$, respectively estimated by having 
recourse e.g. to the two marginal latent standard Normal variates according to the usual 
two step computation, with $a_{k,0} = -\infty$ and $a_{k,I_k} = +\infty$. 

Later on we will assume that $I_k = I$, that is each category has been chosen by at least one 
respondent, possibly replacing the zero $n_{ij}$'s with a small quantity, e.g. $\epsilon = 0.5$. 

By considering the polychoric coefficients for each pair $X_h$ and $X_k$, $h, k \in \{1, \dots, K\}$, we 
can obtain the polychoric correlation matrix (and also the covariance matrix, if appropriate 
location and scale values are assigned to the underlying latent variables $X^*_k$ (Jöreskog 2005)), 
which, according to the covariance-based approach, is necessary for the parameter estimation 
of a structural model with manifest indicators of the ordinal type. 

In case of manifest variables of generic type appropriate correlations should be computed 
(see Drasgow 1986); in particular: a) polychoric correlation coefficients for pairs of ordinal 
variables, b) polyserial correlation coefficients between an ordinal variable and one defined 
on an interval or ratio scale, and c) Pearson correlation coefficients between variables defined 
on interval or ratio scales. However we will assume, later on, that only variables on ordinal 
scales are present. 
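
As an illustration of the two step computation for a single pair of ordinal variables, the following sketch estimates the thresholds from the margins of a contingency table and then maximizes the loglikelihood over $\rho$. It is a minimal sketch and not a reference implementation: the function names are hypothetical, the bivariate Normal distribution function is approximated by Simpson's rule, the maximization uses a golden-section search on $(-0.99, 0.99)$ (an assumption made for simplicity), and all categories are assumed to be used.

```python
import math
from statistics import NormalDist

STD = NormalDist()


def bvn_cdf(a, b, rho, steps=200):
    """P(U <= a, V <= b) for a standard bivariate Normal with correlation rho,
    integrating phi(u) * Phi((b - rho*u) / sqrt(1 - rho^2)) by Simpson's rule."""
    lower, upper = -8.0, min(a, 8.0)          # +/-8 stands in for +/-infinity
    if upper <= lower:
        return 0.0
    scale = math.sqrt(1.0 - rho * rho)
    h = (upper - lower) / steps
    total = 0.0
    for i in range(steps + 1):
        u = lower + i * h
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += weight * STD.pdf(u) * STD.cdf((b - rho * u) / scale)
    return total * h / 3.0


def cell_prob(a_h, a_k, i, j, rho):
    """pi_ij for categories i, j (numbered from 1), given both threshold lists."""
    return (bvn_cdf(a_h[i], a_k[j], rho) - bvn_cdf(a_h[i - 1], a_k[j], rho)
            - bvn_cdf(a_h[i], a_k[j - 1], rho) + bvn_cdf(a_h[i - 1], a_k[j - 1], rho))


def thresholds(margin):
    """Step 1: thresholds from marginal counts, with a_0 = -inf and a_I = +inf."""
    total, cumulative, out = sum(margin), 0.0, [-math.inf]
    for n in margin[:-1]:
        cumulative += n / total
        out.append(STD.inv_cdf(cumulative))
    return out + [math.inf]


def polychoric(table):
    """Step 2: maximize sum_ij n_ij * ln(pi_ij) over rho by golden-section search."""
    a_h = thresholds([sum(row) for row in table])
    a_k = thresholds([sum(col) for col in zip(*table)])

    def neg_loglik(rho):
        return -sum(n * math.log(max(cell_prob(a_h, a_k, i + 1, j + 1, rho), 1e-12))
                    for i, row in enumerate(table) for j, n in enumerate(row))

    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    lo, hi = -0.99, 0.99
    c, d = hi - ratio * (hi - lo), lo + ratio * (hi - lo)
    f_c, f_d = neg_loglik(c), neg_loglik(d)
    while hi - lo > 1e-5:
        if f_c < f_d:                         # minimum lies in [lo, d]
            hi, d, f_d = d, c, f_c
            c = hi - ratio * (hi - lo)
            f_c = neg_loglik(c)
        else:                                 # minimum lies in [c, hi]
            lo, c, f_c = c, d, f_d
            d = lo + ratio * (hi - lo)
            f_d = neg_loglik(d)
    return (lo + hi) / 2.0
```

Applying `polychoric` to every pair of items would fill the polychoric correlation matrix; in practice a dedicated routine with a proper bivariate Normal integral is preferable to this quadrature.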



5 Application within the PLS framework 

In presence of manifest indicators of the ordinal type, we suggest a slightly modified version 
of model (5)-(6), where the manifest variables $X$ in relationship (6) are in a certain sense 
'replaced' by the underlying unobservable latent variables $X^*$. 

We do not write explicitly the dependence between $X$ and $X^*$, since for the subject $s = 
1, 2, \dots, N$ the real point value $x^*_{ks}$ for each indicator $X^*_k$ is not known: we only assume that 
it belongs to the interval defined by the threshold values in (8) having as image the observed 
category $x_{ks}$. 

It will be possible to obtain point estimates for the parameters in $D$ and $\Lambda$, while only 
estimates of the threshold values will be directly predicted with regard to the scores of the 
latent variables $Y_j$, $j = 1, \dots, n + m$. 

The PLS algorithm structure typically adopted to obtain the PLS estimates in presence 
of standardized latent variables is briefly presented. Mode A (reflective outer model) is 
considered for outer estimation of latent variables and the centroid scheme for the inner 
estimation (Lohmöller 1989, Tenenhaus et al. 2005). 



Some linear algebra restatement of the algorithm is necessary, which renders the usual 
procedure apt to be applied with minor changes also in presence of manifest variables of the 
ordinal type. 



5.1 The PLS algorithm 

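The PLS procedure for obtaining the estimates of the parameters in (5)-(6) consists of a first 
iterative phase which produces the latent score estimates; subsequently the values of the 
vector $\theta$, containing all the unknown parameters in the model, are estimated 
by applying the Ordinary Least Squares method to all the linear multiple regression 
subproblems into which the inner and outer models are decomposed. 

Remember that the whole set of true latent variables (always measured as differences 
from their respective average values) is summarized by the vector 

$$Y = [\xi_1, \dots, \xi_n, \eta_1, \dots, \eta_m]' = [Y_1, \dots, Y_n, Y_{n+1}, \dots, Y_{n+m}]',$$

the first $n$ elements being of the exogenous type and the remaining $m$ endogenous. Observe 
that, since we are in presence of models of the causal type, the generic endogenous variable 
$Y_{n+j}$, $j = 1, \dots, m$, may only depend on the exogenous variables $Y_1, \dots, Y_n$ and on a 
subset of its preceding endogenous latent variables. 

With reference to the inner model, a square matrix $T$, of order $(n + m)$, indicating the 
structural relationships among latent variables may be defined: 

$$T = \begin{array}{c|ccc|ccc} & Y_1 & \cdots & Y_n & Y_{n+1} & \cdots & Y_{n+m} \\ \hline Y_1 & 0 & \cdots & 0 & 0 & \cdots & 0 \\ \vdots & \vdots & & \vdots & \vdots & & \vdots \\ Y_n & 0 & \cdots & 0 & 0 & \cdots & 0 \\ \hline Y_{n+1} & t_{n+1,1} & \cdots & t_{n+1,n} & 0 & \cdots & 0 \\ \vdots & \vdots & & \vdots & \vdots & \ddots & \vdots \\ Y_{n+m} & t_{n+m,1} & \cdots & t_{n+m,n} & t_{n+m,n+1} & \cdots & 0 \end{array} \qquad (9)$$

where the first $n$ rows and columns refer to the exogenous variables and the remaining $m$ to 
the endogenous ones. 

The generic element $t_{jk}$ of $T$ is given unit value if the endogenous $Y_j$ is directly linked to 
$Y_k$; $t_{jk}$ is null otherwise. Then $T$ may be defined as the indicator matrix of $D$ in (5) by 
having set to 0 the elements on the main diagonal. 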

The PLS algorithm follows an iterative procedure, which defines, at the $r$th generic step, 
the scores of each latent variable $Y_j$, according to the so-called 'outer approximation', as a 
standardized linear combination of the manifest variables corresponding to $Y_j$ 

$$Y_j^{(r)} = \sum_{h=1}^{p_j} w_{jh}^{(r)} (X_{jh} - \bar{x}_{jh}), \quad j = 1, 2, \dots, n + m \qquad (10)$$

with appropriate weights $w_{j1}^{(r)}, \dots, w_{jp_j}^{(r)}$ summing up to 1 (see Lohmöller 1989). 
In the PLS framework each latent variable is thus defined as a 'composite' of its manifest 
indicators. 

Step 0. The starting step of the algorithm uses an arbitrarily defined weighting system that, 
for the sake of simplicity, may be set to 

$$w_{jh}^{(0)} = 1/p_j, \quad h = 1, 2, \dots, p_j, \quad j = 1, 2, \dots, n + m. \qquad (11)$$

The initial scores of the latent variables $Y_j$, $j = 1, 2, \dots, n + m$, are defined as linear 
combinations of the centered values of the corresponding manifest variables $X_{jh}$, $h = 1, 2, \dots, p_j$; 
thus for the generic subject $s$, $s = 1, 2, \dots, N$, we have: 

$$y_{js} = \sum_{h=1}^{p_j} w_{jh}^{(0)} (x_{jhs} - \bar{x}_{jh}) \qquad (12)$$

where $\bar{x}_{jh}$, $h = 1, 2, \dots, p_j$, denote the means of the manifest variables $X_{jh}$ associated to the 
latent variable $Y_j$. Observe that at step 0 the weights sum up to 1 by definition. 
The scores are then standardized 

$$\hat{y}_{js} = y_{js} \Big/ \sqrt{\widehat{Var}(Y_j)} \qquad (13)$$

where 

$$\widehat{Var}(Y_j) = \frac{1}{N} \sum_{s=1}^{N} y_{js}^2$$

is an estimate of the variance of $Y_j$, $N$ being the number of available observations. 
We then set 

$$y_{js} = \hat{y}_{js}. \qquad (14)$$
Iterative phases of the PLS algorithm 

Step 1. Define for each latent variable $Y_j$ an instrumental variable $Z_j$ as a linear combination 
of the estimates of the latent variables $Y_k$ directly linked to $Y_j$ 

$$Z_j = \sum_{k=1}^{n+m} \tau_{jk} Y_k, \qquad (15)$$

where 

$$\tau_{jk} = \max(t_{jk}, t_{kj}) \, \mathrm{sign}[Cov(Y_j, Y_k)] \qquad (16)$$

(remember that $t_{jk}$ is the generic element of the matrix $T$, used to specify the relationships 
in the inner model; $t_{jk} = 1$ if the latent variable $Y_j$ is connected with $Y_k$ in the path model 
representation, $t_{jk} = 0$ otherwise, see (9)) and 

$$Cov(Y_j, Y_k) = \frac{1}{N} \sum_{s=1}^{N} y_{js} y_{ks}, \qquad (17)$$

$Y_j$ having zero mean. 

Step 2. In case of the so-called Mode A (reflective outer model), at every stage $r$ of the 
iteration ($r = 1, 2, \dots$) update the vectors of the weights $w_j^{(r)}$ as 

$$w_{jh}^{(r)} = \pm \, C_{jh} \Big/ \sum_{h=1}^{p_j} C_{jh}, \quad j = 1, 2, \dots, n + m, \quad h = 1, 2, \dots, p_j, \qquad (18)$$

where 

$$C_{jh} = Cov(X_{jh}, Z_j) = \frac{1}{N} \sum_{s=1}^{N} (x_{jhs} - \bar{x}_{jh})(z_{js} - \bar{z}_j), \qquad (19)$$

$$\pm = \mathrm{sign}\Big( \sum_{h=1}^{p_j} \mathrm{sign}[Cov(X_{jh}, Y_j)] \Big), \qquad (20)$$

being 

$$Cov(X_{jh}, Y_j) = \frac{1}{N} \sum_{s=1}^{N} (x_{jhs} - \bar{x}_{jh})(y_{js} - \bar{y}_j) \quad \text{and} \quad \bar{y}_j = \frac{1}{N} \sum_{s=1}^{N} y_{js}.$$

Step 3. Update the outer approximation: 

$$y_{js} = \sum_{h=1}^{p_j} w_{jh}^{(r)} (x_{jhs} - \bar{x}_{jh}) \qquad (21)$$

and standardize as in (13)-(14). 

Looping. Loop step 1 to step 3 until the following convergence stop criterion is attained 

$$\Big( \sum_{j=1}^{m+n} \sum_{h=1}^{p_j} \big(w_{jh}^{(r)} - w_{jh}^{(r-1)}\big)^2 \Big)^{1/2} < \varepsilon$$

where $\varepsilon$ is an appropriately chosen positive convergence tolerance value. 
Ending phase of the PLS algorithm 

Carry out the ordinary least squares estimation of the $\beta_{jk}$ coefficients linking $Y_k$ to $Y_j$ 
(for every inner submodel), of the $\lambda_{jh}$ parameters (outer models, Mode A), specifying the 
linear relations between the latent $Y_j$ and the corresponding manifest $X_{jh}$, and of the residual 
variances (having standardized the involved variables). 
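
The iterative phase on raw scores (steps 0 to 3 with Mode A and the centroid scheme) can be sketched as follows. The function name `pls_mode_a` and its arguments are hypothetical; `chi_w` is assumed to be a 0/1 matrix assigning each manifest variable to its latent variable, every latent variable is assumed to be linked to at least one other, and a zero value of the sign in (20) is arbitrarily resolved as positive.

```python
import numpy as np


def pls_mode_a(X, chi_w, T, tol=1e-8, max_iter=500):
    """Score-based PLS iterations (Mode A, centroid scheme).

    X: (N, P) data matrix; chi_w: (P, J) 0/1 block membership of the indicators;
    T: (J, J) 0/1 inner-model indicator matrix, see (9).
    Returns the weights W and the standardized latent scores.
    """
    X = np.asarray(X, dtype=float)
    chi_w = np.asarray(chi_w, dtype=float)
    links = np.asarray(T, dtype=float)
    links = np.maximum(links, links.T)              # max(t_jk, t_kj), see (16)
    n_obs = X.shape[0]
    Xc = X - X.mean(axis=0)                         # centered manifest variables
    W = chi_w / chi_w.sum(axis=0)                   # step 0: w_jh = 1 / p_j, see (11)
    for _ in range(max_iter):
        Y = Xc @ W                                  # outer approximation, (12) or (21)
        Y = Y / np.sqrt((Y ** 2).mean(axis=0))      # standardization, (13)-(14)
        tau = links * np.sign(Y.T @ Y / n_obs)      # (16)-(17)
        Z = Y @ tau                                 # instrumental variables, (15)
        C = chi_w * (Xc.T @ Z / n_obs)              # C_jh = Cov(X_jh, Z_j), (19)
        pm = np.sign(np.sign(chi_w * (Xc.T @ Y / n_obs)).sum(axis=0))  # (20)
        pm[pm == 0] = 1.0                           # assumption: ties resolved as +
        W_new = C / C.sum(axis=0) * pm              # weight update, (18)
        converged = np.linalg.norm(W_new - W) < tol
        W = W_new
        if converged:
            break
    Y = Xc @ W
    return W, Y / np.sqrt((Y ** 2).mean(axis=0))
```

The ending phase would then regress the standardized scores on each other (inner model) and the manifest variables on their own composite (outer model) by ordinary least squares.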

6 An equivalent formulation of the PLS algorithm and its implementation with ordinal variables 

We now rewrite the PLS algorithm, making extensive use of linear algebra notation, 
in order to avoid the reconstruction, at each step, of the latent scores. The procedure will 
be based on the covariance matrix of the observed manifest variables $X_{jh}$ or the polychoric 
correlation matrix in case of manifest variables of the ordinal type. Namely, in presence 
of ordinal indicators we substitute the categorical variables $X_{jh}$ with the underlying latent 
variables $X^*_{jh}$, see (8), that are standardized and thus centered. Note that the components 
of $X^*$ are not observable, but in the algorithm we will only make use of variances and 
covariances defined on their linear transformations. 

These variances and covariances can be derived as a function of $\Sigma_{XX}$, the covariance matrix 
of the vector random variable containing all the $(p + q)$ manifest indicators $X_{jh}$, of the 
metric/interval type, or their counterparts $X^*_{jh}$ when the indicators are ordinal; in the 
latter case $\Sigma_{XX}$ is the polychoric correlation matrix defined across the ordered categorical 
variables. 



Step 0. The outer approximation for the generic variable $Y_j$ is formally defined, see (10), as 

$$Y_j^{(r)} = \sum_{h=1}^{p_j} w_{jh}^{(r)} (X_{jh} - \bar{x}_{jh}) = \sum_{h=1}^{p_j} w_{jh}^{(r)} \, {}_cX_{jh}.$$

Later on we will omit the symbol $(r)$, here specifying the iteration step of the algorithm. 

Relationship (10) may be written in matrix form as 

$$Y_j = [0, \dots, w_{j1}, \dots, w_{jp_j}, 0, \dots, 0] \begin{bmatrix} {}_cX_{11} \\ \vdots \\ {}_cX_{jp_j} \\ \vdots \\ {}_cX_{(n+m)p_{n+m}} \end{bmatrix} = w_j' \, {}_cX \qquad (22)$$

where ${}_cX$ are the centered manifest variables that, in presence of ordinal indicators, are set 
to ${}_cX = X^*$. 

It is now possible to define the $N \times (m + n)$ matrix $Y$ containing the 
outer approximation values of the latent variables for the $N$ subjects as 

$$Y = {}_cX W = {}_cX [w_1, \dots, w_j, \dots, w_{n+m}],$$

${}_cX$ being the $N \times (p + q)$ matrix of the deviations of the manifest variables from their means, 
and $W = [w_1, \dots, w_j, \dots, w_{n+m}]$ the $(p + q) \times (n + m)$ matrix containing the vectors $w_j$ as columns. 
Thus the covariance (17) between $Y_j$ and $Y_k$ can be expressed as 

$$Cov(Y_j, Y_k) = w_j' E({}_cX \, {}_cX') w_k = w_j' \Sigma_{XX} w_k, \qquad (23)$$

and the variance covariance matrix of the random vector $(Y_1, \dots, Y_{n+m})$ as 

$$\Sigma_{YY} = W' \Sigma_{XX} W. \qquad (24)$$

The standardized version (13) of $Y$ is 

$$\hat{Y} = Y (\Sigma_{YY} * I)^{-1/2} = ({}_cX) W \{[W' \Sigma_{XX} W] * I\}^{-1/2} = ({}_cX) \, {}_sW$$

where $*$ is the Hadamard element by element product, $I$ the identity matrix, and 

$${}_sW = W \{[W' \Sigma_{XX} W] * I\}^{-1/2} \qquad (25)$$

is a transformation of the original weights $W$, which for each group of manifest indicators 
sum up to 1, into a set of weights allowing the latent variables to be on a standardized scale. 
We then set 

$$Y = \hat{Y}.$$

The covariance matrix across the standardized $(Y_1, \dots, Y_{n+m})$ can be re-defined as 

$$\Sigma_{YY} = P_{YY} = {}_sW' \Sigma_{XX} \, {}_sW \qquad (26)$$

so becoming a correlation matrix. 
Iterative phases of the PLS algorithm 



Step 1. The instrumental variables $Z_j$, see (15), which are defined for each latent variable 
$Y_j$ as a linear combination of the estimates of the latent variables $Y_k$ linked to $Y_j$ in the path 
diagram, may be expressed as 

$$Z_j = \tau_j' \, [Y_1, \dots, Y_{n+m}]' = \tau_j' \, {}_sW' \, {}_cX \qquad (27)$$

where $\tau_j$, a vector of the type $[0, \dots, 0, \dots, 1, \dots, 0, 1, 0]'$, is the $j$th column of the 
matrix $\tau$ with generic element $\tau_{jk}$ defined, according to (16), as 

$$\tau = (T + T') * \mathrm{sign}(\Sigma_{YY}) \qquad (28)$$

the elements of $T$ being equal to 0 or 1. 

With reference to the matrix of observable data, the $N \times (m + n)$ matrix $Z$ containing 
the values of the $(n + m)$ instrumental variables $Z_1, \dots, Z_{n+m}$ for the $N$ subjects may be 
obtained as 

$$Z = [Z_1, \dots, Z_j, \dots, Z_{n+m}] = {}_cX \, {}_sW \tau.$$



Step 2. The covariance (19) between $X_{jh}$ and $Z_j$ is defined as 

$$Cov(X_{jh}, Z_j) = Cov\big(X_{jh}, \tau_j' [Y_1, \dots, Y_{n+m}]'\big) = Cov\big(X_{jh}, \tau_j' \, {}_sW' \, {}_cX\big) \qquad (29)$$

and $\Sigma_{XZ}$, the covariance matrix between all the manifest variables $X_{jh}$ and the instrumental 
variables $Z_j$, as 

$$\Sigma_{XZ} = \Sigma_{XX} \, {}_sW \tau. \qquad (30)$$

The covariance between $X_{jh}$ and $Y_j$ is 

$$Cov(X_{jh}, Y_j) = Cov\big(X_{jh}, {}_sw_j' \, {}_cX\big) \qquad (31)$$

and $\Sigma_{XY}$, the covariance matrix between all the manifest variables $X_{jh}$ and the composites 
$Y_j$, can be obtained as 

$$\Sigma_{XY} = \Sigma_{XX} \, {}_sW. \qquad (32)$$

Now define a $(p + q) \times (n + m)$ matrix $C$ by the Hadamard product of the indicator matrix 
$\chi_W$ of the matrix $W$ and the covariance matrix between $X$ and $Z$ 

$$C = \chi_W * \Sigma_{XZ}; \qquad (33)$$

it results in a block diagonal matrix with generic block $C_j = [C_{j1}, \dots, C_{jp_j}]'$, 
$j = 1, \dots, m + n$. 

The matrix $W$ with the updated weights is obtained from 

$$W = C [\mathrm{diag}(1'_{p+q} C)]^{-1} \mathrm{diag}(\pm), \qquad (34)$$

where $1_{p+q}$ is the $(p + q) \times 1$ unitary vector, $\mathrm{diag}(\cdot)$ is the operator transforming a vector into a 
diagonal matrix and $\pm$ is the vector defined as 

$$\pm = \mathrm{sign}\{1'_{p+q} [\mathrm{sign}(\chi_W * \Sigma_{XY})]\}. \qquad (35)$$

Finally transformation (25) may be applied to obtain the standardizing weights ${}_sW$. 



We summarize the sequence of steps defining the reformulation of the PLS algorithm, which 
has the characteristic of avoiding the determination, at each step, of the composite scores 
$y_{js}$ and of the instrumental variable scores $z_{js}$. The ending phases of the PLS algorithm will 
be described later (see Sections 6.1 and 6.2). 

Compute $\Sigma_{XX}$ (in case of ordinal items the polychoric correlation matrix) 
Define the starting weights $W = [w_1, \dots, w_j, \dots, w_{n+m}]$ 
Iterative phase 

  Set $W_{temp} = W$ 
  Compute: 
    $\Sigma_{YY} = W' \Sigma_{XX} W$, see (24) 
    ${}_sW = W \{[W' \Sigma_{XX} W] * I\}^{-1/2} = W [\Sigma_{YY} * I]^{-1/2}$, see (25) 
    $\Sigma_{YY} = P_{YY} = {}_sW' \Sigma_{XX} \, {}_sW$, see (26) 
    $\tau = (T + T') * \mathrm{sign}(\Sigma_{YY})$, see (28) 
    $\Sigma_{XZ} = \Sigma_{XX} \, {}_sW \tau$, see (30) 
    $\Sigma_{XY} = \Sigma_{XX} \, {}_sW$, see (32) 
    $C = \chi_W * \Sigma_{XZ}$, see (33) 
    $\pm = \mathrm{sign}\{1'_{p+q} [\mathrm{sign}(\chi_W * \Sigma_{XY})]\}$, see (35) 
  Update the weights $W = C [\mathrm{diag}(1'_{p+q} C)]^{-1} \mathrm{diag}(\pm)$, see (34) 
  Obtain ${}_sW$, see (25) 
  Check if $\|W - W_{temp}\| < \varepsilon$ 
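
The sequence of steps above translates almost line by line into matrix code. The following is a minimal sketch rather than a definitive implementation: the function name `opls_weights` and its arguments are hypothetical, `S_xx` may be either a covariance matrix or a polychoric correlation matrix computed elsewhere, each latent variable is assumed to be linked to at least one other, and a zero entry of the sign vector (35) is arbitrarily resolved as positive.

```python
import numpy as np


def opls_weights(S_xx, chi_w, T, tol=1e-8, max_iter=500):
    """Covariance-based PLS iterations (Mode A, centroid scheme).

    S_xx: (P, P) covariance or polychoric correlation matrix of the indicators;
    chi_w: (P, J) 0/1 indicator matrix of W; T: (J, J) inner-model indicator matrix.
    Returns the weights W and the standardizing weights sW of (25).
    """
    S_xx = np.asarray(S_xx, dtype=float)
    chi_w = np.asarray(chi_w, dtype=float)
    T = np.asarray(T, dtype=float)
    links = T + T.T
    W = chi_w / chi_w.sum(axis=0)                     # starting weights, 1 / p_j

    def standardize(weights):
        s_yy = np.diag(weights.T @ S_xx @ weights)    # diagonal of (24)
        return weights / np.sqrt(s_yy)                # (25)

    for _ in range(max_iter):
        W_temp = W
        sW = standardize(W)
        R_yy = sW.T @ S_xx @ sW                       # (26)
        tau = links * np.sign(R_yy)                   # (28)
        S_xy = S_xx @ sW                              # (32)
        S_xz = S_xy @ tau                             # (30)
        C = chi_w * S_xz                              # (33)
        pm = np.sign(np.sign(chi_w * S_xy).sum(axis=0))   # (35)
        pm[pm == 0] = 1.0                             # assumption: ties resolved as +
        W = C / C.sum(axis=0) * pm                    # (34)
        if np.linalg.norm(W - W_temp) < tol:
            break
    return W, standardize(W)


# Usage: two latent variables with two indicators each, Y_1 -> Y_2.
S = np.array([[1.0, 0.7, 0.4, 0.2],
              [0.7, 1.0, 0.3, 0.3],
              [0.4, 0.3, 1.0, 0.5],
              [0.2, 0.3, 0.5, 1.0]])
chi = np.array([[1, 0], [1, 0], [0, 1], [0, 1]])
T = np.array([[0, 0], [1, 0]])
W, sW = opls_weights(S, chi, T)
```

No latent or instrumental scores are ever formed: only `S_xx` enters the iterations, which is what makes the procedure applicable to ordinal items.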

6.1 Ending phase of the PLS algorithm with manifest variables of 
the interval type 

After convergence of the weights in $W$ the score values can be determined as 

$$Y = {}_cX \, {}_sW$$

and OLS regressions carried out (on standardized variables) to obtain the estimates of the 
parameters in $D$ and $\Lambda$ and the variances of the error components as the residual variances 
of the corresponding regression models. 



6.2 Parameter Estimation of the inner and outer relationships in 
presence of ordinal manifest variables 

The estimates in the inner and outer regression models can also be obtained without having 
to reconstruct the score values $y_{js}$, which cannot be estimated in presence of manifest variables 
of the ordinal type: their prediction may be obtained according to one of the procedures 
illustrated in Section 6.3. 

The parameter estimates make reference to the following linear regression models, defined 
on standardized variables, 

$$Y_j = {}_j\beta_1 \, {}_{R_j}Y_1 + \dots + {}_j\beta_d \, {}_{R_j}Y_d + \nu_j \qquad (36)$$

where $d < j$ and $\{{}_{R_j}Y_1, \dots, {}_{R_j}Y_d\}$ is a subset of $\{Y_1, \dots, Y_{j-1}\}$ defined according to the 
$j$th equation in (5). 

The estimate of the vector ${}_j\beta = [{}_j\beta_1, \dots, {}_j\beta_d]'$, which contains the unknown elements 
of the $m + j$ row of matrix $D$ in (5), may be computed as 

$${}_j\hat{\beta} = {}_{R_j}\Sigma_{YY}^{-1} \, {}_{R_j}\sigma_{YY} \qquad (37)$$

where ${}_{R_j}\Sigma_{YY}$ is the matrix obtained by extracting from $\Sigma_{YY} = P_{YY}$, the correlation 
matrix of $Y$, see (26), the rows and columns pertaining to the independent variables in the 
linear model (36), and ${}_{R_j}\sigma_{YY}$ is the vector obtained by extracting from the $j$th column of 
$\Sigma_{YY}$ the elements corresponding to correlations between $Y_j$ and its covariates, according to 
relationship (36). 

Let now $\Sigma_{XY}$ be the correlation matrix between the manifest indicators $X$ and the 
composites $Y$, which can be derived from (32). The estimate of parameter $\lambda_{jh}$ in the outer 
model 

$$X_{jh} = \lambda_{jh} Y_j + \varepsilon_{jh}, \quad j = 1, \dots, m + n, \quad h = 1, \dots, p_j$$

is given by the correlation coefficient between $X_{jh}$ and $Y_j$. 

The ending phase of the PLS algorithm, in presence of manifest variables of the ordinal 
type, can then be summarized in the following way: 

$${}_j\hat{\beta} = {}_{R_j}\Sigma_{YY}^{-1} \, {}_{R_j}\sigma_{YY}$$

$\hat{\lambda}_{jh}$ is equal to the correlation coefficient between $X_{jh}$ and $Y_j$. 

In presence of manifest variables of the interval type the procedure gives, by having recourse 
to Pearson's correlations, the same results as the ending phase of the usual PLS algorithm 
presented in Section 6.1. 

Only the 'covariance' or 'correlation' matrix of the manifest variables $X$ is needed in 
order to determine the final weights and the inner and outer model parameter estimates. In 
presence of manifest indicators of the ordinal type we propose to use the polychoric correlation 
matrix. This is consistent with the so-called METRIC 1 option suggested by Lohmöller 
(1989) (see also Rigdon 2012, Tenenhaus et al. 2005), performing the standardization of all 
manifest indicators. 

Both polychoric covariance and correlation matrices must be invertible for the above 
procedure to work. However, constrained algorithms do exist whenever invertibility problems 
should arise for the polychoric correlation matrix. (For example the function hetcor, available 
in the R polycor package, makes it possible to require that the computed matrix of pairwise 
polychoric, polyserial, and Pearson correlations be positive-definite). 

After having transformed the manifest variables according to (8) the threshold values 
related to the standard normal variables $X^*_1, \dots, X^*_K$ are available. In this case we have 
$\Sigma_{X^*X^*} = P_{X^*X^*}$, that is the polychoric covariance matrix between the underlying latent 

The algorithm we have presented for ordinal manifest variables will be denoted as Ordinal 
Partial Least Squares (OPLS) from now on. 
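
The ending phase of this section, based on (37) and on the correlations between indicators and composites, can be sketched as follows. The function name `ending_phase` is hypothetical, `S_xx` is assumed to be a correlation matrix (so that (32) directly yields correlations) and `sW` is assumed to contain standardizing weights obtained after convergence.

```python
import numpy as np


def ending_phase(S_xx, sW, chi_w, T):
    """Inner path coefficients and outer loadings without latent scores.

    S_xx: (P, P) correlation matrix of the indicators; sW: (P, J) standardizing
    weights; chi_w: (P, J) 0/1 indicator matrix of W; T: (J, J) matrix with
    t_jk = 1 when Y_k is a regressor of Y_j.
    """
    S_xx = np.asarray(S_xx, dtype=float)
    sW = np.asarray(sW, dtype=float)
    chi_w = np.asarray(chi_w, dtype=float)
    T = np.asarray(T)
    R_yy = sW.T @ S_xx @ sW                     # correlation matrix of Y, (26)
    R_xy = S_xx @ sW                            # correlations between X and Y, (32)
    paths = np.zeros_like(R_yy)
    for j in range(T.shape[0]):
        pred = np.flatnonzero(T[j])             # regressors of Y_j in (36)
        if pred.size:
            # (37): solve the normal equations on the extracted sub-matrix
            paths[j, pred] = np.linalg.solve(R_yy[np.ix_(pred, pred)], R_yy[pred, j])
    loadings = (chi_w * R_xy).sum(axis=1)       # lambda_jh = Corr(X_jh, Y_j)
    return paths, loadings
```

Row `j` of `paths` holds the coefficients of the regressors of $Y_j$, and `loadings` lists one value per manifest indicator in the order of the rows of `S_xx`.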



6.3 Prediction of Latent Scores 

A point estimate of the composite continuous (latent) $Y_j$ cannot be determined in presence 
of ordinal variables for the generic subject; we can only establish an interval of possible 
values conditional on the threshold values pertaining to the latent variables $X^*_{jh}$ that underlie 
each ordinal manifest variable. 

Since each underlying $X^*_{jh}$ variable is assumed to be a standard Normal variate, the 
composite variable $Y_j$, defined by the outer approximation 

$$Y_j = \sum_{h=1}^{p_j} {}_sw_{jh} X^*_{jh},$$

will also be distributed according to a standard Normal variate. 

A set of threshold values $a^{Y_j}_i$, $i = 1, \dots, I - 1$, can be derived from the threshold values 
$a_{jh,i}$ referred to the variables $X^*_{jh}$, $h = 1, \dots, p_j$, $I$ being the common number of categories 
assumed by the variables $X_{jh}$, as 

$$a^{Y_j}_i = \sum_{h=1}^{p_j} {}_sw_{jh} \, a_{jh,i}.$$

Should some threshold values equal $\pm\infty$ they have to be replaced with $\pm 4$. Later we will 
also consider $a^{Y_j}_0 = -4$ and $a^{Y_j}_I = 4$. 

For the generic subject $s$ expressing the values $x_{jhs}$ for the variables $X_{jh}$, $h = 1, \dots, p_j$, 
linked to the generic $Y_j$, let us first define the sets of $y$ values that are images of all possible $x_{jhs}$. 

In case subject $s$ chooses the same category $i$ for all the manifest indicators of $Y_j$, that 
is $x_{j1s} = \dots = x_{jp_js} = i$ with $i \in \{1, \dots, I\}$, the image will be of the type 

$$A_i = (a^{Y_j}_{i-1}, a^{Y_j}_i],$$

which we will call 'homogeneous thresholds'. 

Otherwise, see Fig. 1, the set which is the image of all possible responses $x_{jhs}$ will not 
correspond exactly to one subset $A_i$. Let us denote this set with 

$$C_{js} = (\alpha^{Y_j}_s, \beta^{Y_j}_s]$$

where: 

$$\alpha^{Y_j}_s = \sum_{h=1}^{p_j} {}_sw_{jh} \, a_{jh,x_{jhs}-1} \quad \text{and} \quad \beta^{Y_j}_s = \sum_{h=1}^{p_j} {}_sw_{jh} \, a_{jh,x_{jhs}},$$

$a_{jh,x_{jhs}-1}$ and $a_{jh,x_{jhs}}$ being the threshold values corresponding to the category $x_{jhs}$ observed by 
subject $s$, that is the values defining the interval for $X^*_{jh}$ to have $x_{jhs}$ as image according to 
(8). 

Figure 1: Latent variable category assignment, see (38) 

In order to assign a category to subject s for the latent variable Yj we can use one of the 
following options: 

1. Mode estimation. Compute, see Fig. 1, the probabilities for $C_{js}$ to overlap each set
$A_i$ defined by the 'homogeneous thresholds'

$P(C_{js} \cap A_i), \quad i = 1, \dots, I$ (38)

and, for subject $s$, select the set $A_i$ with maximum probability. To the set $A_i$
corresponds the assignment of category $i$ as a score estimate for the latent variable $Y_j$.

2. Median estimation. Compute the median of the variable $Y_j$ over the interval $C_{js}$

$\mathrm{Median}(Y_j \mid Y_j \in C_{js}) = \Phi^{-1}\left( \frac{\Phi(\alpha_{js}) + \Phi(\beta_{js})}{2} \right)$ (39)

the category $i$ pertaining to the set $A_i$ to which $\mathrm{Median}(Y_j \mid Y_j \in C_{js})$
belongs is assigned to subject $s$.

3. Mean estimation. Compute the mean of the variable $Y_j$ over the interval $C_{js}$

$E(Y_j \mid Y_j \in C_{js}) = \frac{1}{\Phi(\beta_{js}) - \Phi(\alpha_{js})} \int_{\alpha_{js}}^{\beta_{js}} y \, \phi(y) \, dy = \frac{\phi(\alpha_{js}) - \phi(\beta_{js})}{\Phi(\beta_{js}) - \Phi(\alpha_{js})}$ (40)

the category $i$ pertaining to the set $A_i$ to which $E(Y_j \mid Y_j \in C_{js})$ belongs is
assigned to subject $s$.
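The three assignment rules can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the R implementation used in the paper: the function names are hypothetical, the weights are assumed to be already scaled so that $Y_j$ is standard Normal, infinite thresholds are assumed to be already replaced by $\pm 4$, and each item is given as its list of $I-1$ finite thresholds.

```python
from bisect import bisect_left
from statistics import NormalDist

STD_NORMAL = NormalDist()  # Y_j is assumed standard Normal
CAP = 4.0                  # value used in place of infinite thresholds


def composite_thresholds(weights, item_thresholds):
    """Return [a_0, ..., a_I] for the composite Y_j.

    item_thresholds[h] holds the I-1 finite thresholds of indicator h.
    """
    inner = [
        sum(w * t[i] for w, t in zip(weights, item_thresholds))
        for i in range(len(item_thresholds[0]))
    ]
    return [-CAP] + inner + [CAP]


def subject_interval(weights, item_thresholds, responses):
    """Return (alpha, beta) for a subject answering responses[h] in 1..I."""
    alpha = beta = 0.0
    for w, t, x in zip(weights, item_thresholds, responses):
        full = [-CAP] + list(t) + [CAP]  # a_0, ..., a_I for this item
        alpha += w * full[x - 1]
        beta += w * full[x]
    return alpha, beta


def category_of(y, thresholds):
    """Category i such that a_{i-1} < y <= a_i."""
    return bisect_left(thresholds[1:-1], y) + 1


def mode_estimate(alpha, beta, thresholds):
    """Category whose set A_i has the largest probability of overlap, as in (38)."""
    probs = []
    for lo, hi in zip(thresholds[:-1], thresholds[1:]):
        left, right = max(alpha, lo), min(beta, hi)
        probs.append(max(0.0, STD_NORMAL.cdf(right) - STD_NORMAL.cdf(left)))
    return probs.index(max(probs)) + 1


def median_estimate(alpha, beta, thresholds):
    """Category containing the conditional median, as in (39)."""
    p = 0.5 * (STD_NORMAL.cdf(alpha) + STD_NORMAL.cdf(beta))
    return category_of(STD_NORMAL.inv_cdf(p), thresholds)


def mean_estimate(alpha, beta, thresholds):
    """Category containing the mean of the truncated standard Normal, as in (40)."""
    mass = STD_NORMAL.cdf(beta) - STD_NORMAL.cdf(alpha)
    mean = (STD_NORMAL.pdf(alpha) - STD_NORMAL.pdf(beta)) / mass
    return category_of(mean, thresholds)
```

For a subject who chooses the same category on every indicator, the interval coincides with one of the 'homogeneous' sets and the three rules agree; they can differ only for mixed response patterns.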



6.4 Bias effects of the OPLS algorithm 



Schneeweiss (1993) shows that parameter estimates obtained with the PLS algorithm are
negatively biased for the inner model (these estimates are related to the covariances or
correlations across latent variables). The OPLS is based on the analysis of the polychoric
correlation matrix, which is obtained by maximizing the correlation across the latent
variables that generate, according to (8), the manifest variables. Especially in presence of
items with a low number of categories, polychoric correlations are usually larger than
Pearson's ones, as was also observed by Coenders et al. (1997). Thus we may expect that the
distribution of the inner model parameter estimates obtained with OPLS stochastically
dominates that of the PLS algorithm, possibly reducing the negative bias of estimates based on
Pearson's correlations.
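The attenuation of Pearson's correlation on coarsely categorized data can be illustrated with a small simulation. The sketch below does not estimate a polychoric correlation (that requires a likelihood maximization); it only compares Pearson's coefficient computed on two latent Normal variables with the same coefficient computed after cutting them into 4 categories. The thresholds, sample size, seed and function names are assumptions chosen for illustration.

```python
import random
from bisect import bisect_left


def pearson(x, y):
    """Plain Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def discretize(values, thresholds):
    """Map each continuous value to an ordinal category 1..len(thresholds)+1."""
    return [bisect_left(thresholds, v) + 1 for v in values]


def simulate(rho=0.6, n=20000, thresholds=(-1.0, 0.0, 1.0), seed=1):
    """Return (Pearson on latent variables, Pearson on ordinal versions)."""
    rng = random.Random(seed)
    x, y = [], []
    for _ in range(n):
        u, v = rng.gauss(0, 1), rng.gauss(0, 1)
        x.append(u)
        y.append(rho * u + (1 - rho ** 2) ** 0.5 * v)  # corr(x, y) = rho
    r_latent = pearson(x, y)
    r_ordinal = pearson(discretize(x, thresholds), discretize(y, thresholds))
    return r_latent, r_ordinal
```

Calling `simulate()` returns a pair in which the second value (ordinal data) is expected to be smaller than the first (latent data); a polychoric estimate targets the first value.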

However, the reduction in the bias of the inner model parameter estimates for OPLS can
have a drawback: a positive bias in the estimation of the parameters in the outer model.

Fornell & Cha (1994) report, for the special case of identical correlation coefficients (say
$\rho$) across all the manifest variables, the following relationship between the bias of the
PLS algorithm, with respect to maximum likelihood estimates, for the outer model parameters
$\lambda$ and the bias referred to the common correlation across latent variables, upon which
the inner model coefficients $\beta$ are obtained:

$\mathrm{bias}(\lambda) = \frac{1}{\sqrt{\mathrm{bias}(\rho)}}$

A high value for bias($\lambda$) corresponds to a low value of bias($\rho$); we have observed
this issue in the illustration presented in Section 7.2.

In any case, outer model parameters are not the most important target in a decision making
procedure based on the PLS estimation of a structural equation model with latent variables:
the main role is played by the inner model parameter estimates and by the weights $w_{jh}$,
see (10) and (22), defining each PLS latent variable as a 'composite', that is a linear
combination of its related manifest variables. The largest weights are related to the
manifest variables which are supposed to have greater influence in driving the 'composite';
moreover, since the weights sum up to 1 they should not suffer from any dimensional bias
problem.



6.5 Assessing reliability

Scale reliability can be assessed, for ordinal scales, by having recourse to methods based
on the polychoric correlation matrix (see Gadermann et al. 2012, Zumbo et al. 2007) for
Cronbach's $\alpha$. The Dillon-Goldstein's rho (Chin 1998) and methods presented for
covariance based models (Green & Yang 2009, Raykov 2002), which make reference to all
relationships in the structural equation model (1)-(3), can also be implemented.
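Two of the indices mentioned above have simple closed forms, sketched here in Python under stated assumptions. The function names are hypothetical; `ordinal_alpha` applies the standardized Cronbach's $\alpha$ formula to a correlation matrix that is assumed to be a polychoric one computed elsewhere, and `dillon_goldstein_rho` assumes standardized loadings with error variances $1 - \lambda^2$.

```python
def ordinal_alpha(corr):
    """Standardized Cronbach's alpha from a k x k correlation matrix.

    Passing a polychoric correlation matrix gives an 'ordinal' alpha.
    """
    k = len(corr)
    off_diag = [corr[i][j] for i in range(k) for j in range(k) if i != j]
    r_bar = sum(off_diag) / len(off_diag)  # mean inter-item correlation
    return k * r_bar / (1 + (k - 1) * r_bar)


def dillon_goldstein_rho(loadings):
    """Composite reliability from standardized loadings."""
    explained = sum(loadings) ** 2
    error = sum(1 - l ** 2 for l in loadings)
    return explained / (explained + error)
```

For instance, three items with all pairwise correlations equal to 0.5 give $\alpha = 0.75$.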
Figure 2: Path diagram for the mobile phone industry customer satisfaction model



7 Illustrative examples 



The Partial Least Squares algorithm has been successfully applied to estimate models aimed
at measuring customer satisfaction, first at a national level (Fornell 1996, Fornell et al.
1996) and then also in a business context (Johnson & Gustafsson 2000, Johnson et al. 2001). A
widespread literature on this field is available.



The OPLS methodology is here implemented in R (R Core Team 2012) and applied to
a well-known data set describing the measurement of customer satisfaction in the mobile phone
industry (Bayol et al. 2000, Tenenhaus et al. 2005). By means of this example we compare
the behaviour of PLS and OPLS in presence of a traditional questionnaire whose items
are characterized by a high number of categories (say 10).

Some simulations are then reported to analyze the behaviour of the procedure when the 
number of points for each item is reduced. 



The R procedures by Fox (2010) and Revelle (2012) are used to compute polychoric
correlation matrices, with minor changes to allow polychoric correlations to be computed
when the number of categories is larger than 8. We never needed to force the polychoric
correlation matrix to comply with the positive definiteness condition.



7.1 The Mobile Phone Data Set 



We applied the procedure to a classical example on mobile phones, presented in Bayol et al.
(2000) and Tenenhaus et al. (2005). Data (250 observations) are available e.g. in Sanchez &
Trinchera (2012). Data were collected on 24 ordered categorical variables with 10 categories;
the observed variables are summarized by 7 latent variables.

The customer satisfaction model underlying the mobile phone data refers to a version of
the European Customer Satisfaction Index with 1 exogenous latent variable, the Image ($Y_1$)
with 5 manifest indicators, and 6 endogenous latent variables: Customer Expectations ($Y_2$),
Perceived Quality ($Y_3$), Perceived Value ($Y_4$), Customer Satisfaction ($Y_5$), Loyalty
($Y_6$) and Complaints ($Y_7$), with respectively 3, 7, 2, 3, 1 and 3 manifest indicators.
See Fig. 2 for the inner path model relationships. Table 1 in Tenenhaus et al. (2005) contains
the structure of the questionnaire; it can be considered as a possible instrument for customer
satisfaction measurement in the mobile phone industry.

Table 1 reports the parameter estimates obtained both with the standard PLS algorithm
and with the OPLS algorithm.

Surprisingly, results are quite similar but not identical. When the number of categories
is sufficiently high, Pearson correlation coefficients are good approximations for their
polychoric counterparts, but responses in customer satisfaction surveys do usually have skewed
distributions, since respondents do not effectively choose, with a non-negligible frequency,
all the available categories of the manifest variables. 6 out of 24 manifest indicators had
only 6 categories with at least 5 respondents. Thus differences between Pearson and polychoric
correlations may be evident, with effects on parameter estimates.

Table 1: Mobile phone industry customer satisfaction model: PLS and OPLS parameter
estimates with their significance (s.e. in brackets)

Parameter        PLS               OPLS
$\beta_{21}$     0.491 (0.000)     0.584 (0.000)
$\beta_{32}$     0.545 (0.000)     0.612 (0.000)
$\beta_{42}$     0.067 (0.281)     0.037 (0.563)
$\beta_{43}$     0.540 (0.000)     0.596 (0.000)
$\beta_{51}$     0.153 (0.006)     0.199 (0.001)
$\beta_{52}$     0.035 (0.431)     0.035 (0.423)
$\beta_{53}$     0.544 (0.000)     0.517 (0.000)
$\beta_{54}$     0.201 (0.000)     0.198 (0.000)
[...]            0.541 (0.000)     0.563 (0.000)
[...]            0.212 (0.001)     0.261 (0.000)
[...]            0.466 (0.000)     0.493 (0.000)
$\beta_{76}$     0.051 (0.376)     0.043 (0.417)

Coefficients computed with the OPLS algorithm that are significantly different from 0 are
larger (except for $\beta_{53}$ and $\beta_{54}$) than those obtained with the PLS algorithm.
The latter is known to underestimate the inner model parameters (see Schneeweiss 1993), and it
is also based on Pearson's correlations, which underestimate real correlations when the ordinal
manifest variables are measured on scales with a small number of categories.

Figure 3 shows a comparison of the latent scores reconstructed with the two methodologies.
Information about $Y_6$ is not reported since the variable is identical to its unique
manifest indicator. Recall that according to the PLS algorithm the scores are weighted
averages of the values expressed by the subjects on the proxies; the scores are thus generated
on an interval scale. With the OPLS algorithm latent variable scores can only be predicted
according to one of the procedures presented in Section 6.3 and their values are on the same
ordinal scale common to the proxy variables (the procedure 'Mode estimation' was adopted
to produce the graph).

Table 2 shows the degree of coherency of the latent scores obtained with the traditional PLS
algorithm and the 3 procedures presented in Section 6.3 for OPLS. Having rounded the scores
obtained with the PLS algorithm to integer values, percentages of exact concordance are
reported in the first three lines, while the remaining lines report percentages of concordance
with a difference between rounded values not larger than 1. We have at least 70% exact
concordance (except for latent variable $Y_7$); about 90% or more of the cases for all latent
variables show a difference between rounded values not larger than 1.
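The concordance percentages can be computed as in the following sketch. The function names are hypothetical, and rounding halves upwards is an assumption, since the rounding rule applied to the PLS scores is not specified.

```python
import math


def round_half_up(value):
    """Round to the nearest integer, halves upwards (assumed rule)."""
    return math.floor(value + 0.5)


def concordance(pls_scores, opls_categories, tolerance=0):
    """Percentage of subjects whose rounded PLS score differs from the
    OPLS category by at most `tolerance` (0 = exact concordance)."""
    hits = sum(
        abs(round_half_up(p) - c) <= tolerance
        for p, c in zip(pls_scores, opls_categories)
    )
    return 100.0 * hits / len(pls_scores)
```

With `tolerance=0` the function gives the exact concordance of the first three lines of the table, with `tolerance=1` the concordance reported in the remaining lines.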

Figure 3: PLS and OPLS (Mode estimation) latent variable score comparison
[panels for latent variables Y_1, Y_2, Y_3, Y_4, Y_5 and Y_7; horizontal axis: scores from PLS]

Table 2: Coherency of latent scores between PLS and OPLS

Method                     Y1      Y2      Y3      Y4      Y5      Y7
Mode estimation (a)        70.0    71.6    79.2    84.4    71.6    49.2
Median estimation (a)      74.8    75.2    78.0    88.0    70.4    51.2
Mean estimation (a)        72.8    77.2    75.6    86.8    71.6    50.4
Mode estimation (b)        98.8    98.0   100.0    99.6    99.2    89.6
Median estimation (b)      99.2    98.4   100.0   100.0    99.2    94.0
Mean estimation (b)        99.2    98.4   100.0    99.6    99.6    90.0

(a) percentages of exact concordance after having rounded PLS scores to integer values
(b) percentages of concordance with a difference between rounded values not larger than 1

The weights $w_j$, see relationships (18) or (34), play an important role in making decisions
based on PLS estimation of structural equation models with latent variables. They establish
which proxy variables drive the behaviour of a latent variable. Latent variables are defined as
'composite' variables in the PLS algorithm, that is weighted averages of their manifest
indicators with weights $w_j$. Figure 4 shows the comparison of the weights for the 7 latent
variables obtained with the two algorithms. Points with the same number identify the
weights assigned by the two algorithms to the manifest indicators of each latent variable
$Y_j$, $j = 1, 2, \dots, 5, 7$ (values 6 do not appear since $Y_6$ has only one indicator with
unitary weight). Dashed bands include pairs of weights which differ, between the two
methodologies, by no more than 0.05. Only 2 of the 23 weights are outside the bands. We can
conclude that the two methods construct composite variables in much the same manner.
Dotted lines give information, for each latent variable, about the indicator with the largest
weight determined by the two algorithms: except for latent variables $Y_2$ and $Y_3$, the same
proxy variable is identified as the most important in explaining each latent construct.

Figure 4: PLS and OPLS weights comparison
[horizontal axis: weights from PLS]

7.2 Some Simulations 

To compare the performance of the classical PLS algorithm with the OPLS for different
numbers of points in the scale of manifest variables, we considered some simulations from the
following model

$\eta_1 = \gamma_{11} \xi_1 + \zeta_1$

$\eta_2 = \beta_{21} \eta_1 + \gamma_{22} \xi_2 + \gamma_{23} \xi_3 + \zeta_2$

$\eta_3 = \beta_{32} \eta_2 + \zeta_3$

see Figure 5. Measurement models of the reflective type were assumed, with 3 manifest
ordinal reflective indicators for each latent variable

$X_{ih} = \lambda^x_{ih} \xi_i + \varepsilon_{ih}$, $i = 1, 2, 3$, $h = 1, 2, 3$, and $Y_{ih} = \lambda^y_{ih} \eta_i + \delta_{ih}$, $i = 1, 2, 3$, $h = 1, 2, 3$.

Figure 5: Path diagram for simulated models

In order to take into account the presence of asymmetric distributions, the latent variables
$\xi_i$, $i = 1, 2, 3$, were generated, in separate simulations, according both to the standard
Normal distribution for all variables and to Beta distributions with parameters
($\alpha = 11$, $\beta = 2$) for $\xi_1$, ($\alpha = 16$, $\beta = 3$) for $\xi_2$ and
($\alpha = 54$, $\beta = 7$) for $\xi_3$, which were then standardized.
Theoretical asymmetry indices $\gamma_1 = -0.9573$, $-0.7992$ and $-0.6043$ correspond to the
three Beta distributions. Values of the asymmetry indices for the mobile phone data analyzed
in Section 7.1 are in the range $(-1.07, -0.22)$, except for one variable showing positive
asymmetry. The model parameters were fixed to $\gamma_{11} = 0.9$, $\gamma_{22} = 0.5$,
$\gamma_{23} = 0.6$, $\beta_{21} = 0.5$ and $\beta_{32} = 0.6$. The $\lambda$ coefficients of the
measurement models were set to 0.8, 0.9, 0.95. Both the variances of the error components
$\zeta_i$, $i = 1, 2, 3$, in the inner model and those pertaining to the errors in the
measurement models were set to values ensuring that the latent variables $\eta_i$,
$i = 1, 2, 3$, and the manifest variables $X_{ih}$, $i = 1, 2, 3$, $h = 1, 2, 3$, and $Y_{ih}$,
$i = 1, 2, 3$, $h = 1, 2, 3$, have unit variance.
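The asymmetry indices quoted for the three Beta distributions follow from the standard skewness formula of a Beta($\alpha$, $\beta$) variate, and the standardization uses its mean and variance. A minimal Python check, with hypothetical function names:

```python
import random
from math import sqrt


def beta_skewness(a, b):
    """Theoretical skewness of a Beta(a, b) distribution."""
    return 2 * (b - a) * sqrt(a + b + 1) / ((a + b + 2) * sqrt(a * b))


def standardized_beta_draw(a, b, rng=random):
    """One Beta(a, b) draw, centred and scaled with the theoretical moments."""
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return (rng.betavariate(a, b) - mean) / sqrt(var)


for a, b in [(11, 2), (16, 3), (54, 7)]:
    print(a, b, round(beta_skewness(a, b), 4))
```

Standardization is a linear transformation, so it leaves the skewness unchanged.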

Manifest variables $X_{ih}$ and $Y_{ih}$ were rescaled according to the following rules

$\mathrm{scaled}X_{ih} = \frac{X_{ih} - \min(X_{ih})}{\max(X_{ih}) - \min(X_{ih}) + 0.01} \cdot npoints + 0.5$

$\mathrm{scaled}Y_{ih} = \frac{Y_{ih} - \min(Y_{ih})}{\max(Y_{ih}) - \min(Y_{ih}) + 0.01} \cdot npoints + 0.5$

with extrema computed over the sample realizations, being $npoints$ the desired number of
points common to all items. Values were then rounded to obtain integer (ordinal) responses.
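The rescaling rule can be sketched as follows. The function name is hypothetical and rounding halves upwards is an assumption; with this choice the ratio lies in [0, 1), so every response falls in 1, ..., npoints.

```python
import math


def to_ordinal(values, npoints):
    """Rescale a continuous sample to integer responses 1..npoints."""
    lo, hi = min(values), max(values)
    responses = []
    for v in values:
        scaled = (v - lo) / (hi - lo + 0.01) * npoints + 0.5
        responses.append(math.floor(scaled + 0.5))  # round half up (assumed)
    return responses
```

The constant 0.01 in the denominator keeps the sample maximum strictly below `npoints + 0.5`, so it is rounded to `npoints` rather than to `npoints + 1`.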

Simulations were performed by considering 4, 5, 7 and 9 categories in the scales, which 
correspond to the situations commonly encountered in practice. We expect results from the 
PLS and the OPLS procedures to be quite similar in presence of 9 categories, since in this 
case polychoric correlations are close to their corresponding Pearson correlations. 

For each instance 500 replications were made, each with 250 observations.

To compare the performance of the two procedures we considered the empirical distribution
of the inner model parameter estimate biases, see Tables 3-10.

As expected (see Schneeweiss 1993), estimates obtained with the traditional PLS algorithm
are negatively biased. Only for scales with 5, 7 and 9 categories can we observe about
5% of trials with a small or negligible positive bias for Normally distributed latent
variables. The bias gets more evident with a decreasing number of scale points. The behaviour
is common to both the Normal and Beta situations. With the OPLS about 10% of simulations
always present a positive bias. Most percentage points of the bias distribution for the OPLS
procedure are closer to 0 than with traditional PLS. Average biases are again closer to 0 for
the OPLS algorithm.

Percentage points and average values are very close for the two estimation procedures in 
case of a 9 point scale. 

The ratio between the absolute biases observed in each trial with OPLS and the traditional
PLS was considered, to better compare the two procedures. The distribution of the
ratios is shown in the third sections of Tables 3-10, giving evidence that over 90% of trials
have an absolute bias of the OPLS lower than that of the traditional PLS when scales are
characterized by 4 and 5 points. By comparing the 5% and 95% percentage points for the
distributions of ratios of absolute biases in case of the Normal assumption with 4 point
scales, we can observe the better behaviour of the OPLS: for parameter $\gamma_{22}$ the 5% and
95% percentage points of the ratio are 0.0728 and 3.8032. According to the latter value, in 5%
of trials the absolute bias of the OPLS estimates is more than 3.8 times that of traditional
PLS. According to the former value, in 5% of trials the absolute bias of traditional PLS is
more than 1/0.0728 = 13.7 times that of OPLS.

Geometric means have been computed to summarize ratios between absolute biases of
OPLS and traditional PLS and in all situations (except for $\gamma_{11}$, 9 points, Beta
distribution) they are lower than 1. Their values increase with increasing number of scale
points and get close to 1 in presence of scales with 9 points and asymmetric Beta distribution
of the latent variables.
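The summary of the ratios can be sketched as below. The function names are hypothetical; the inputs are the per-trial biases of one parameter under the two procedures, and a trial with a PLS bias exactly equal to 0 is assumed not to occur.

```python
from statistics import geometric_mean


def abs_bias_ratios(opls_biases, pls_biases):
    """Per-trial ratio |bias OPLS| / |bias PLS|."""
    return [abs(o) / abs(p) for o, p in zip(opls_biases, pls_biases)]


def summarize(ratios):
    """Geometric mean of the ratios and share of trials where OPLS is closer to 0."""
    share_better = sum(r < 1 for r in ratios) / len(ratios)
    return geometric_mean(ratios), share_better
```

The geometric mean is the natural summary for ratios, since a ratio of 2 and a ratio of 1/2 then cancel out to 1.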

In Section 6.4 we recalled how the reduction in the bias attained by OPLS for the inner
model parameter estimates can have as a drawback an increase in the bias of the
outer model $\lambda$ parameter estimates. The bias is evident if we examine Figures 6-13,
which report Box & Whiskers plots for the distribution of the bias of the coefficient
estimates $\beta_{ij}$ and $\lambda$ from their theoretical values and the distribution of the
weights $w_{ij}$ under the Normal and Beta assumptions for scales with 4 and 9 points.

However, as we have already remarked, the role played by outer parameters is less important
than that of the inner model parameters: when making decisions based on PLS
results the weights $w_{ij}$ are used instead of outer parameters; we recall that PLS defines
each latent variable as a 'composite' of its manifest indicators, see (10) and (22), and the
weights give information about the strength of the relationship between each 'composite' and
its manifest indicators. According to the Box & Whiskers plots, the estimates of the weights
seem to be always characterized by a lower variability (interquartile range) when obtained
with the OPLS algorithm.

8 Conclusions and Final Remarks 

A PLS algorithm dealing with variables on ordinal scales has been presented. The algorithm, 
OPLS, is based on the use of the polychoric correlation matrix and seems to perform better 
than the traditional PLS algorithm in presence of ordinal scales with a small number of 
point alternatives, by reducing the bias of the inner model parameter estimates. 

A basic feature of PLS is the so-called soft modelling, requiring no distributional
assumptions on the variables appearing in the structural equation model. With the OPLS
algorithm the continuous variables underlying the categorical manifest indicators are
considered multinormally distributed. This can appear a strong assumption but, as Bartolomew
(1996) observes, every distribution can be obtained as a transformation of the Normal one,
which can suit most situations: for instance, in presence of a manifest variable with a
negative asymmetric distribution, points on the right side of a scale will have the highest
frequency and the underlying latent variable should also be of an asymmetric type, but
transformation (8) will work anyway, assigning larger intervals to the classes defined by the
thresholds to which the highest points in the scale correspond.

Furthermore polychoric correlations are expected to overestimate real correlations when 
scales present some kind of asymmetry, but this can be regarded as a positive feature for 
the OPLS algorithm. This may represent a correction of the negative bias characterizing 
PLS algorithms with regard to the estimates of the inner model parameters (which are in 
some way linked to correlations across manifest variables). 
Table 3: Bias distribution of the inner model parameter estimates (4 points, Normal
distribution) obtained with PLS and OPLS and distribution of the ratio between absolute values
of the biases: percentage points, means and standard deviations

Param.         True    5%       10%      25%      50%      75%      90%      95%      mean     sd

PLS
$\gamma_{11}$  0.9    -0.166   -0.158   -0.144   -0.125   -0.107   -0.094   -0.087   -0.126   0.025
$\gamma_{22}$  0.5    -0.128   -0.118   -0.095   -0.067   -0.039   -0.019   -0.004   -0.068   0.039
$\gamma_{23}$  0.6    -0.147   -0.131   -0.110   -0.084   -0.056   -0.033   -0.021   -0.083   0.038
$\beta_{21}$   0.5    -0.131   -0.119   -0.098   -0.072   -0.046   -0.023   -0.010   -0.072   0.038
$\beta_{32}$   0.6    -0.164   -0.149   -0.115   -0.083   -0.050   -0.022   -0.006   -0.084   0.049

OPLS
$\gamma_{11}$  0.9    -0.111   -0.103   -0.087   -0.070   -0.052   -0.039   -0.027   -0.070   0.025
$\gamma_{22}$  0.5    -0.101   -0.090   -0.065   -0.036   -0.004    0.019    0.035   -0.035   0.042
$\gamma_{23}$  0.6    -0.111   -0.095   -0.072   -0.044   -0.014    0.009    0.023   -0.044   0.042
$\beta_{21}$   0.5    -0.103   -0.091   -0.067   -0.039   -0.011    0.016    0.031   -0.039   0.042
$\beta_{32}$   0.6    -0.138   -0.111   -0.077   -0.044   -0.010    0.020    0.036   -0.046   0.052

Ratio of absolute biases OPLS over PLS (last column: geometric mean)
$\gamma_{11}$  0.9     0.329    0.392    0.465    0.557    0.613    0.666    0.693    0.522
$\gamma_{22}$  0.5     0.073    0.166    0.376    0.594    0.755    1.090    3.803    0.531
$\gamma_{23}$  0.6     0.113    0.182    0.385    0.577    0.697    0.792    0.982    0.483
$\beta_{21}$   0.5     0.100    0.207    0.414    0.621    0.747    0.914    2.559    0.543
$\beta_{32}$   0.6     0.112    0.244    0.436    0.606    0.736    0.911    3.437    0.575



Table 4: Bias distribution of the inner model parameter estimates (5 points, Normal
distribution) obtained with PLS and OPLS and distribution of the ratio between absolute values
of the biases: percentage points, means and standard deviations

Param.         True    5%       10%      25%      50%      75%      90%      95%      mean     sd

PLS
$\gamma_{11}$  0.9    -0.145   -0.136   -0.123   -0.105   -0.092   -0.078   -0.070   -0.107   0.023
$\gamma_{22}$  0.5    -0.120   -0.106   -0.079   -0.057   -0.029   -0.008    0.004   -0.056   0.037
$\gamma_{23}$  0.6    -0.126   -0.111   -0.095   -0.071   -0.046   -0.027   -0.015   -0.071   0.035
$\beta_{21}$   0.5    -0.114   -0.104   -0.086   -0.060   -0.035   -0.013    0.010   -0.060   0.038
$\beta_{32}$   0.6    -0.150   -0.134   -0.103   -0.071   -0.039   -0.011    0.007   -0.072   0.048

OPLS
$\gamma_{11}$  0.9    -0.110   -0.098   -0.085   -0.069   -0.054   -0.043   -0.034   -0.070   0.023
$\gamma_{22}$  0.5    -0.100   -0.088   -0.060   -0.035   -0.005    0.018    0.027   -0.034   0.039
$\gamma_{23}$  0.6    -0.103   -0.090   -0.071   -0.046   -0.018    0.003    0.014   -0.045   0.036
$\beta_{21}$   0.5    -0.095   -0.085   -0.066   -0.040   -0.013    0.010    0.036   -0.038   0.040
$\beta_{32}$   0.6    -0.131   -0.110   -0.079   -0.046   -0.010    0.017    0.033   -0.046   0.049

Ratio of absolute biases OPLS over PLS (last column: geometric mean)
$\gamma_{11}$  0.9     0.459    0.511    0.590    0.651    0.704    0.751    0.772    0.629
$\gamma_{22}$  0.5     0.107    0.220    0.509    0.700    0.823    1.670    4.287    0.641
$\gamma_{23}$  0.6     ...

Table 5: Bias distribution of the inner model parameter estimates (7 points, Normal
distribution) obtained with PLS and OPLS and distribution of the ratio between absolute values
of the biases: percentage points, means and standard deviations

Table 6: Bias distribution of the inner model parameter estimates (9 points, Normal
distribution) obtained with PLS and OPLS and distribution of the ratio between absolute values
of the biases: percentage points, means and standard deviations

Table 7: Bias distribution of the inner model parameter estimates (4 points, Beta
distribution) obtained with PLS and OPLS and distribution of the ratio between absolute values
of the biases: percentage points, means and standard deviations

Table 8: Bias distribution of the inner model parameter estimates (5 points, Beta
distribution) obtained with PLS and OPLS and distribution of the ratio between absolute values
of the biases: percentage points, means and standard deviations

Table 9: Bias distribution of the inner model parameter estimates (7 points, Beta
distribution) obtained with PLS and OPLS and distribution of the ratio between absolute values
of the biases: percentage points, means and standard deviations

Table 10: Bias distribution of the inner model parameter estimates (9 points, Beta
distribution) obtained with PLS and OPLS and distribution of the ratio between absolute values
of the biases: percentage points, means and standard deviations

Figure 6: Parameter estimates bias and weights distribution (4 points, Normal distribution)

Figure 7: Parameter estimates bias and weights distribution (5 points, Normal distribution)

Figure 8: Parameter estimates bias and weights distribution (7 points, Normal distribution)

Figure 9: Parameter estimates bias and weights distribution (9 points, Normal distribution)

Figure 10: Parameter estimates bias and weights distribution (4 categories, Beta distribution)

Figure 11: Parameter estimates bias and weights distribution (5 categories, Beta distribution)

Figure 12: Parameter estimates bias and weights distribution (7 categories, Beta distribution)

Figure 13: Parameter estimates bias and weights distribution (9 categories, Beta distribution)

The gain in the bias reduction is less evident for scales with a higher number of categories,
for which polychoric correlation values are closer to Pearson's correlations. In these
cases ordinal scales can be considered as if they were of the interval type, possibly according
to the so-called pragmatic approach to measurement (Hand 2009).

Increasing the number of the points of the scale can help the performance of the tradi- 
tional PLS algorithm when the scale is interpreted as continuous, but it often happens that 
in presence of asymmetric distributions many points of the scale are characterized by a very 
low response frequency, since the number of points that respondents do effectively choose 
may be quite restricted. Thus the administered scale corresponds to a scale with a lower 
number of points and OPLS can anyway be useful in these situations. 

Another important feature of the PLS predictive approach is the direct estimation of
latent scores. With the OPLS algorithm we can estimate some thresholds for the latent
variables, from which a 'category' indication for the ordinal latent variable follows according
to one of the 3 estimation methods presented in Section 6.3.

Simulations have been carried out to assess the properties of the algorithm also in presence
of asymmetric distributions for latent variables: in these cases too, the bias characterizing
the inner model parameter estimates obtained with the traditional PLS algorithm was reduced.



Further research will consider a comparison with the Optimal Scaling techniques (Mair
& de Leeuw 2010) that were proposed within the PLS framework by Nappo (2009) and
Lauro et al. (2011).